{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Hybrid Search\n",
    "In this example we are going to show how to do a hybrid query combining the vector database Milvus and the relational database Postgres. A hybrid query allows you to search based on many parameters and is useful for situations where you have to narrow down your results. In the future, Milvus 2.0 will allow you to perform this type of searching without having to use a secondary relational database. \n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Data\n",
    "\n",
    "In this example we are using randomly generated data. We do this because we are mainly trying to demonstrate the flow of doing a hybrid search. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Requirements\n",
    "\n",
    "| Python Packages | Docker Servers |\n",
    "| --------------- | -------------- |\n",
    "| pymilvus        | Milvus-1.1.0   |\n",
    "| numpy           | Postgres          |\n",
    "|  psycopg2 |\n",
    "|  faker |\n",
    "\n",
    "We have included a `requirements.txt` file in order to easily satisfy the required packages. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Up and Running\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": []
   },
   "source": [
    "## Installing Packages\n",
    "Install the required python packages. If you are on mac and recieve an error downloading psycopg2, please first install postgresql with brew."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Requirement already satisfied: pymilvus==1.1.0 in /Users/filiphaltmayer/opt/miniconda3/envs/coinbase_demo/lib/python3.8/site-packages (from -r requirements.txt (line 1)) (1.1.0)\n",
      "Requirement already satisfied: numpy in /Users/filiphaltmayer/opt/miniconda3/envs/coinbase_demo/lib/python3.8/site-packages (from -r requirements.txt (line 2)) (1.20.2)\n",
      "Requirement already satisfied: psycopg2-binary in /Users/filiphaltmayer/opt/miniconda3/envs/coinbase_demo/lib/python3.8/site-packages (from -r requirements.txt (line 3)) (2.8.6)\n",
      "Requirement already satisfied: psycopg2 in /Users/filiphaltmayer/opt/miniconda3/envs/coinbase_demo/lib/python3.8/site-packages (from -r requirements.txt (line 4)) (2.8.6)\n",
      "Requirement already satisfied: faker in /Users/filiphaltmayer/opt/miniconda3/envs/coinbase_demo/lib/python3.8/site-packages (from -r requirements.txt (line 5)) (8.1.4)\n",
      "Requirement already satisfied: gdown in /Users/filiphaltmayer/opt/miniconda3/envs/coinbase_demo/lib/python3.8/site-packages (from -r requirements.txt (line 6)) (3.13.0)\n",
      "Requirement already satisfied: grpcio>=1.22.0 in /Users/filiphaltmayer/opt/miniconda3/envs/coinbase_demo/lib/python3.8/site-packages (from pymilvus==1.1.0->-r requirements.txt (line 1)) (1.37.0)\n",
      "Requirement already satisfied: ujson>=2.0.0 in /Users/filiphaltmayer/opt/miniconda3/envs/coinbase_demo/lib/python3.8/site-packages (from pymilvus==1.1.0->-r requirements.txt (line 1)) (4.0.2)\n",
      "Requirement already satisfied: requests>=2.22.0 in /Users/filiphaltmayer/opt/miniconda3/envs/coinbase_demo/lib/python3.8/site-packages (from pymilvus==1.1.0->-r requirements.txt (line 1)) (2.25.1)\n",
      "Requirement already satisfied: grpcio-tools>=1.22.0 in /Users/filiphaltmayer/opt/miniconda3/envs/coinbase_demo/lib/python3.8/site-packages (from pymilvus==1.1.0->-r requirements.txt (line 1)) (1.37.0)\n",
      "Requirement already satisfied: six>=1.5.2 in /Users/filiphaltmayer/opt/miniconda3/envs/coinbase_demo/lib/python3.8/site-packages (from grpcio>=1.22.0->pymilvus==1.1.0->-r requirements.txt (line 1)) (1.15.0)\n",
      "Requirement already satisfied: protobuf<4.0dev,>=3.5.0.post1 in /Users/filiphaltmayer/opt/miniconda3/envs/coinbase_demo/lib/python3.8/site-packages (from grpcio-tools>=1.22.0->pymilvus==1.1.0->-r requirements.txt (line 1)) (3.15.8)\n",
      "Requirement already satisfied: setuptools in /Users/filiphaltmayer/opt/miniconda3/envs/coinbase_demo/lib/python3.8/site-packages (from grpcio-tools>=1.22.0->pymilvus==1.1.0->-r requirements.txt (line 1)) (52.0.0.post20210125)\n",
      "Requirement already satisfied: idna<3,>=2.5 in /Users/filiphaltmayer/opt/miniconda3/envs/coinbase_demo/lib/python3.8/site-packages (from requests>=2.22.0->pymilvus==1.1.0->-r requirements.txt (line 1)) (2.10)\n",
      "Requirement already satisfied: chardet<5,>=3.0.2 in /Users/filiphaltmayer/opt/miniconda3/envs/coinbase_demo/lib/python3.8/site-packages (from requests>=2.22.0->pymilvus==1.1.0->-r requirements.txt (line 1)) (4.0.0)\n",
      "Requirement already satisfied: urllib3<1.27,>=1.21.1 in /Users/filiphaltmayer/opt/miniconda3/envs/coinbase_demo/lib/python3.8/site-packages (from requests>=2.22.0->pymilvus==1.1.0->-r requirements.txt (line 1)) (1.26.4)\n",
      "Requirement already satisfied: certifi>=2017.4.17 in /Users/filiphaltmayer/opt/miniconda3/envs/coinbase_demo/lib/python3.8/site-packages (from requests>=2.22.0->pymilvus==1.1.0->-r requirements.txt (line 1)) (2020.12.5)\n",
      "Requirement already satisfied: python-dateutil>=2.4 in /Users/filiphaltmayer/opt/miniconda3/envs/coinbase_demo/lib/python3.8/site-packages (from faker->-r requirements.txt (line 5)) (2.8.1)\n",
      "Requirement already satisfied: text-unidecode==1.3 in /Users/filiphaltmayer/opt/miniconda3/envs/coinbase_demo/lib/python3.8/site-packages (from faker->-r requirements.txt (line 5)) (1.3)\n",
      "Requirement already satisfied: filelock in /Users/filiphaltmayer/opt/miniconda3/envs/coinbase_demo/lib/python3.8/site-packages (from gdown->-r requirements.txt (line 6)) (3.0.12)\n",
      "Requirement already satisfied: tqdm in /Users/filiphaltmayer/opt/miniconda3/envs/coinbase_demo/lib/python3.8/site-packages (from gdown->-r requirements.txt (line 6)) (4.60.0)\n",
      "Requirement already satisfied: PySocks!=1.5.7,>=1.5.6 in /Users/filiphaltmayer/opt/miniconda3/envs/coinbase_demo/lib/python3.8/site-packages (from requests>=2.22.0->pymilvus==1.1.0->-r requirements.txt (line 1)) (1.7.1)\n"
     ]
    }
   ],
   "source": [
    "# ! brew install postgresql\n",
    "! pip install -r requirements.txt"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Starting Milvus Server\n",
    "\n",
    "This demo uses Milvus 1.1.0, please refer to the [Install Milvus](https://milvus.io/docs/v1.1.0/install_milvus.md) guide to learn how to use this docker container. For this example we wont be mapping any local volumes. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "7b01eff74e3762367bda3b4cc7b1ae60a84f1641f7203cee611bf599aca53b4f\n"
     ]
    }
   ],
   "source": [
    "! docker run --name milvus_cpu_1.1.0 -d \\\n",
    "-p 19530:19530 \\\n",
    "-p 19121:19121 \\\n",
    "milvusdb/milvus:1.1.0-cpu-d050721-5e559c"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Starting Postgres Server\n",
    "For now, Milvus doesn't support storing multiple attributes for the data. Because of this we have to use another service to store these attributes and search through them, in this case PostgreSQL. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "eb5bb315e687c745c2986c0894a1aff5db0fa12d30b305a30072df611bcf1372\n"
     ]
    }
   ],
   "source": [
    "! docker run --name postgres -d  -p 5432:5432 -e POSTGRES_HOST_AUTH_METHOD=trust postgres"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Confirm Running Servers"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "    __  _________ _   ____  ______    \n",
      "   /  |/  /  _/ /| | / / / / / __/    \n",
      "  / /|_/ // // /_| |/ / /_/ /\\ \\    \n",
      " /_/  /_/___/____/___/\\____/___/     \n",
      "\n",
      "Welcome to use Milvus!\n",
      "Milvus Release version: v1.1.0, built at 2021-05-06 14:50.43, with OpenBLAS library.\n",
      "You are using Milvus CPU edition\n",
      "Last commit id: 5e559cd7918297bcdb55985b80567cb6278074dd\n",
      "\n",
      "Loading configuration from: /var/lib/milvus/conf/server_config.yaml\n",
      "WARNNING: You are using SQLite as the meta data management, which can't be used in production. Please change it to MySQL!\n",
      "Supported CPU instruction sets: avx2, sse4_2\n",
      "FAISS hook AVX2\n",
      "Milvus server started successfully!\n"
     ]
    }
   ],
   "source": [
    "! docker logs milvus_cpu_1.1.0"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "********************************************************************************\n",
      "WARNING: POSTGRES_HOST_AUTH_METHOD has been set to \"trust\". This will allow\n",
      "         anyone with access to the Postgres port to access your database without\n",
      "         a password, even if POSTGRES_PASSWORD is set. See PostgreSQL\n",
      "         documentation about \"trust\":\n",
      "         https://www.postgresql.org/docs/current/auth-trust.html\n",
      "         In Docker's default configuration, this is effectively any other\n",
      "         container on the same system.\n",
      "\n",
      "         It is not recommended to use POSTGRES_HOST_AUTH_METHOD=trust. Replace\n",
      "         it with \"-e POSTGRES_PASSWORD=password\" instead to set a password in\n",
      "         \"docker run\".\n",
      "********************************************************************************\n",
      "The files belonging to this database system will be owned by user \"postgres\".\n",
      "This user must also own the server process.\n",
      "\n",
      "The database cluster will be initialized with locale \"en_US.utf8\".\n",
      "The default database encoding has accordingly been set to \"UTF8\".\n",
      "The default text search configuration will be set to \"english\".\n",
      "\n",
      "Data page checksums are disabled.\n",
      "\n",
      "fixing permissions on existing directory /var/lib/postgresql/data ... ok\n",
      "creating subdirectories ... ok\n",
      "selecting dynamic shared memory implementation ... posix\n",
      "selecting default max_connections ... 100\n",
      "selecting default shared_buffers ... 128MB\n",
      "selecting default time zone ... Etc/UTC\n",
      "creating configuration files ... ok\n",
      "running bootstrap script ... ok\n",
      "performing post-bootstrap initialization ... ok\n",
      "syncing data to disk ... ok\n",
      "\n",
      "\n",
      "Success. You can now start the database server using:\n",
      "\n",
      "    pg_ctl -D /var/lib/postgresql/data -l logfile start\n",
      "\n",
      "initdb: warning: enabling \"trust\" authentication for local connections\n",
      "You can change this by editing pg_hba.conf or using the option -A, or\n",
      "--auth-local and --auth-host, the next time you run initdb.\n",
      "waiting for server to start....2021-05-19 18:03:10.417 UTC [49] LOG:  starting PostgreSQL 13.2 (Debian 13.2-1.pgdg100+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit\n",
      "2021-05-19 18:03:10.420 UTC [49] LOG:  listening on Unix socket \"/var/run/postgresql/.s.PGSQL.5432\"\n",
      "2021-05-19 18:03:10.427 UTC [50] LOG:  database system was shut down at 2021-05-19 18:03:10 UTC\n",
      "2021-05-19 18:03:10.431 UTC [49] LOG:  database system is ready to accept connections\n",
      " done\n",
      "server started\n",
      "\n",
      "/usr/local/bin/docker-entrypoint.sh: ignoring /docker-entrypoint-initdb.d/*\n",
      "\n",
      "waiting for server to shut down....2021-05-19 18:03:10.572 UTC [49] LOG:  received fast shutdown request\n",
      "2021-05-19 18:03:10.575 UTC [49] LOG:  aborting any active transactions\n",
      "2021-05-19 18:03:10.577 UTC [49] LOG:  background worker \"logical replication launcher\" (PID 56) exited with exit code 1\n",
      "2021-05-19 18:03:10.577 UTC [51] LOG:  shutting down\n",
      "2021-05-19 18:03:10.594 UTC [49] LOG:  database system is shut down\n",
      " done\n",
      "server stopped\n",
      "\n",
      "PostgreSQL init process complete; ready for start up.\n",
      "\n",
      "2021-05-19 18:03:10.695 UTC [1] LOG:  starting PostgreSQL 13.2 (Debian 13.2-1.pgdg100+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit\n",
      "2021-05-19 18:03:10.696 UTC [1] LOG:  listening on IPv4 address \"0.0.0.0\", port 5432\n",
      "2021-05-19 18:03:10.696 UTC [1] LOG:  listening on IPv6 address \"::\", port 5432\n",
      "2021-05-19 18:03:10.699 UTC [1] LOG:  listening on Unix socket \"/var/run/postgresql/.s.PGSQL.5432\"\n",
      "2021-05-19 18:03:10.704 UTC [68] LOG:  database system was shut down at 2021-05-19 18:03:10 UTC\n",
      "2021-05-19 18:03:10.708 UTC [1] LOG:  database system is ready to accept connections\n"
     ]
    }
   ],
   "source": [
    "! docker logs postgres"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Code Overview\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Connecting to Servers\n",
    "We first start off by connecting to the servers. In this case they are all docker containers and are running on localhost with their default ports."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [],
   "source": [
    "#Connectings to Milvus and Postgres\n",
    "\n",
    "import milvus\n",
    "import psycopg2\n",
    "\n",
    "milv = milvus.Milvus(host='localhost', port='19530')\n",
    "conn = psycopg2.connect(host='localhost', port='5432', user='postgres', password='postgres')\n",
    "cursor = conn.cursor()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Creating the Collection \n",
    "\n",
    "The next step is to create the collection in Milvus in order to store the vectors. We need to specify the parameters `collection_name`, `dimension`, `index_file_size`, and `metric_type` when creating it. In this case we are storing 128-dimensional vectors and using the Euclidean distance. Our data segments are also set to the default 1024MB. \n",
    "\n",
    "In this case we are also deleting the collection so that we have a fresh start each time this notebook is loaded up. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Status(code=0, message='Create collection successfully!')\n"
     ]
    }
   ],
   "source": [
    "collection_name = 'hybrid_search'\n",
    "VEC_DIM = 128\n",
    "\n",
    "milv.drop_collection(collection_name)\n",
    "\n",
    "param = {\n",
    "            'collection_name': collection_name,\n",
    "            'dimension': VEC_DIM,\n",
    "            'index_file_size':1024,\n",
    "            'metric_type':milvus.MetricType.L2\n",
    "        }\n",
    "status, ok = milv.has_collection(collection_name)\n",
    "\n",
    "if not ok:\n",
    "    status = milv.create_collection(param)\n",
    "    print(status)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Creating the Index\n",
    "Currently, a collection only supports one index type. In this case we are using the ivf_sq8 index. Since this index is an ivf index, we must provide the parameter `nlist`. This parameter tells milvus how many clusters to make in each index file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(collection_name='hybrid_search', index_type=<IndexType: IVF_SQ8>, params={'nlist': 16384})\n"
     ]
    }
   ],
   "source": [
    "index_param = {\n",
    "    'nlist': 16384\n",
    "}\n",
    "status = milv.create_index(collection_name, milvus.IndexType.IVF_SQ8, index_param)\n",
    "status, index = milv.get_index_info(collection_name)\n",
    "print(index)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Creating the Table in Postgres  \n",
    "PostgreSQL will be used to store the Milvus ID and its corresponding attributes. Here is a description of the attributes:\n",
    "- `sex`:\t   Define the sex of the human: male or female.\n",
    "- `age`:\t Specifies the age of the human: 1-99\n",
    "- `has_glasses`: \tDefines if the human face wears glasses: True or False."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [],
   "source": [
    "def create_pg_table(conn,cursor,table_name):\n",
    "    try:       \n",
    "        sql = \"CREATE TABLE \" + table_name + \" (ids bigint, sex char(10), age smallint, has_glasses boolean);\"\n",
    "        cursor.execute(sql)\n",
    "        conn.commit()\n",
    "        print(\"Created postgres table!\")\n",
    "    except:\n",
    "        print(\"Can't create postgres table.\")\n",
    "        "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " Before creating the table we are clearing any existing tables. This is done in order to have a clean run each time when loading this notebook."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Created postgres table!\n"
     ]
    }
   ],
   "source": [
    "table_name ='hybrid_search'\n",
    "drop_table = \"DROP TABLE IF EXISTS \" + table_name\n",
    "\n",
    "cursor.execute(drop_table)\n",
    "conn.commit()\n",
    "\n",
    "create_pg_table(conn, cursor, table_name)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Processing and Storing the Data\n",
    "For this example we are going to be using randomly generated data to simulate a users situation. We are going to randomly assign sex, age, and if they wear glasses to randomly generated vectors. \n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 1. Generating Embeddings \n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "def generate_data(amount):\n",
    "    embed = np.random.rand(amount, VEC_DIM).astype('float32')\n",
    "    return embed\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 2. Storing Data by ID in Postgres\n",
    "For this example we are loading in the IDs and attributes through chunks. For each chunk, we write to a .csv file and then write that csv file to the Postgres server."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [],
   "source": [
    "import random\n",
    "from faker import Faker\n",
    "import os\n",
    "fake = Faker()\n",
    "\n",
    "def record_txt(ids,fname):\n",
    "    with open(fname,'w+') as f:\n",
    "        for i in range(len(ids)):\n",
    "            sex = random.choice(['female','male'])\n",
    "            age = random.randint(1,99)\n",
    "            has_glasses = random.choice(['True','False'])\n",
    "            line = str(ids[i]) + \"|\" + sex + \"|\" + str(age) + \"|\" + str(has_glasses) + \"\\n\"\n",
    "            f.write(line)\n",
    "            \n",
    "def copy_data_to_pg(conn, cursor,fname ,table_name):\n",
    "    fname = os.path.join(os.getcwd(),fname)\n",
    "    try:\n",
    "        sql = \"COPY \" + table_name + \" FROM STDIN DELIMITER '|' CSV HEADER\"\n",
    "        cursor.copy_expert(sql, open(fname, \"r\"))\n",
    "        conn.commit()\n",
    "        \n",
    "    except Exception as e:\n",
    "        conn.rollback()\n",
    "        print(\"copy data to postgres failed: \", e)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 3. Inserting into Milvus and Postgres\n",
    "When inserting the data into Milvus and Postgres, we push the vectors by chunks of size `BASE_LEN`. Milvus and Postgres perform better when doing batch inserts."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Insert Step: 1/10\n",
      "Insert Step: 2/10\n",
      "Insert Step: 3/10\n",
      "Insert Step: 4/10\n",
      "Insert Step: 5/10\n",
      "Insert Step: 6/10\n",
      "Insert Step: 7/10\n",
      "Insert Step: 8/10\n",
      "Insert Step: 9/10\n",
      "Insert Step: 10/10\n"
     ]
    }
   ],
   "source": [
    "filen = 't.csv'\n",
    "VEC_NUM = 10000\n",
    "BASE_LEN = 1000\n",
    "count = 0\n",
    "while count < (VEC_NUM // BASE_LEN):\n",
    "    vectors = generate_data(BASE_LEN)\n",
    "    vectors_ids = [id for id in range(count*BASE_LEN,(count+1)*BASE_LEN)]\n",
    "    status, ids = milv.insert(collection_name=collection_name, records=vectors, ids=vectors_ids)\n",
    "    record_txt(ids,filen)\n",
    "    copy_data_to_pg(conn, cursor,filen ,table_name)\n",
    "    count =count + 1\n",
    "    print(\"Insert Step: \" + str(count) + \"/\" + str(int(VEC_NUM/BASE_LEN)))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Performing Search\n",
    "Once we have the data all loaded up, we can finally then perform the searches. We begin by first creating a vector to search for. With this vector we first search through Milvues to find the closest vector IDs. We then combine these IDs with the attributes being searched for in order to perform the search in the Postgres server. In this example we are searching the closest vectors that match `sex`, `has_glasses`, and is under `age`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [],
   "source": [
    "TOP_K = 10\n",
    "_param = {'nprobe': 64}\n",
    "\n",
    "def search_in_milvus(vector, milvus_connection):\n",
    "    status, results = milvus_connection.search(collection_name = collection_name,query_records=vector, top_k=TOP_K, params=_param)\n",
    "    \n",
    "    return results"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [],
   "source": [
    "def search_in_pg(conn,cursor,result_ids,result_distance,sex,age,glasses):\n",
    "    ids = str(result_ids[0])\n",
    "    i = 1\n",
    "    while i < len(result_ids):\n",
    "        ids = ids + \",\" + str(result_ids[i])\n",
    "        i = i + 1\n",
    "    sql = \"select * from \" + table_name + \" where ids in (\" + ids + \")\" + \"and age <=\" + str(age) + \" and sex='\" + sex + \"' and has_glasses='\" + str(glasses) + \"';\"\n",
    "\n",
    "    try:\n",
    "        cursor.execute(sql)\n",
    "        rows=cursor.fetchall()\n",
    "        return rows\n",
    "    except Exception as e:\n",
    "        print(\"search failed!:\", e)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this example we are querying 4 random vectors."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [],
   "source": [
    "query_vec = generate_data(4)\n",
    "milvus_results = search_in_milvus(query_vec, milv)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "After recieving the results from Milvus, we then have to pull out the IDs and Distances from the result in order to search the Postgres server. Finally, all the values are bundled up for each query vector under `hybrid_results`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [],
   "source": [
    "sex = \"male\"\n",
    "glasses = \"True\"\n",
    "age = 64\n",
    "\n",
    "\n",
    "hybrid_results = []\n",
    "\n",
    "for single_query in milvus_results:\n",
    "    result_ids, result_distances = [], []\n",
    "    for result_vector in single_query:\n",
    "        result_ids.append(result_vector.id)\n",
    "        result_distances.append(result_vector.distance)\n",
    "        \n",
    "    sql_results = search_in_pg(conn, cursor, result_ids, result_distances, sex, age, glasses)\n",
    "    hybrid_results.append((result_ids, result_distances, sql_results))\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Search Results\n",
    "Once we have all the results the only step left is to print out all the results. In this case we order all the results by distance before printing out the results. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [],
   "source": [
    "def merge_rows_distance(full_results):\n",
    "    \n",
    "    rows = full_results[2]\n",
    "    distance = full_results[1]\n",
    "    ids = full_results[0]\n",
    "    \n",
    "    new_results = []\n",
    "    if len(rows)>0:\n",
    "        for row in rows:\n",
    "            index_flag = ids.index(row[0])\n",
    "            temp = [row[0]] + list(row[1:5]) + [distance[index_flag]]\n",
    "            new_results.append(temp)\n",
    "            \n",
    "        new_results = np.array(new_results)\n",
    "        sort_arg = np.argsort(new_results[:,4])\n",
    "        new_results = new_results[sort_arg].tolist()\n",
    "        columns = [\"ids:\", \"sex:\", \"age:\", \"has_glasses:\", \"distance:\"]\n",
    "        \n",
    "        new_results.insert(0, columns)\n",
    "        \n",
    "        col_width = max(len(word) for row in new_results for word in row) + 2  # padding\n",
    "        for row in new_results:\n",
    "            print(\"\".join(word.ljust(col_width) for word in row))\n",
    "        \n",
    "    else:\n",
    "        print(\"no result\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Query: 0\n",
      "ids:                sex:                age:                has_glasses:        distance:           \n",
      "4823                male                21                  True                13.637537002563477  \n",
      "7922                male                20                  True                14.373846054077148  \n",
      "------------------------------------------------------------------------------------------------------------------------\n",
      "Query: 1\n",
      "ids:                sex:                age:                has_glasses:        distance:           \n",
      "4036                male                37                  True                14.453821182250977  \n",
      "4106                male                33                  True                14.65188217163086   \n",
      "1027                male                36                  True                14.950194358825684  \n",
      "3077                male                42                  True                15.067806243896484  \n",
      "------------------------------------------------------------------------------------------------------------------------\n",
      "Query: 2\n",
      "ids:                sex:                age:                has_glasses:        distance:           \n",
      "2736                male                27                  True                14.40953254699707   \n",
      "5084                male                63                  True                14.495954513549805  \n",
      "5088                male                50                  True                15.167840003967285  \n",
      "------------------------------------------------------------------------------------------------------------------------\n",
      "Query: 3\n",
      "ids:                sex:                age:                has_glasses:        distance:           \n",
      "2211                male                4                   True                14.640177726745605  \n",
      "------------------------------------------------------------------------------------------------------------------------\n"
     ]
    }
   ],
   "source": [
    "for x in range(len(hybrid_results)):\n",
    "    print(\"Query: \" + str(x))\n",
    "    merge_rows_distance(hybrid_results[x])\n",
    "    print(\"-\"*120)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
